An Analysis of NBA Player Performance Using Principal Component Analysis

Jordan Iserman, Jeshurun Moses, and Harsha Pola

Introduction

  • What is principal component analysis (PCA)?
  • When and why was PCA first proposed?
  • What dataset is being used?

Methods

  • Standardize the data
  • Construct the covariance matrix
  • Find its eigenvalues and eigenvectors
  • Create the principal components
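
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not necessarily the code used for the analysis: the function name `pca` and the synthetic array `X` are assumptions introduced here, and the random data merely stands in for the NBA data set.

```python
import numpy as np

def pca(X):
    """Hypothetical helper: PCA of an (n_samples, n_features) array."""
    # 1. Standardize each column to mean 0 and unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized columns.
    S = np.cov(Z, rowvar=False)
    # 3. Eigen-decomposition; eigh handles symmetric matrices and
    #    returns eigenvalues in ascending order, so reverse them.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Principal component scores: project the data onto the eigenvectors.
    scores = Z @ eigvecs
    return eigvals, eigvecs, scores

# Synthetic stand-in data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += 2.0 * X[:, 0]  # make two columns correlated

eigvals, eigvecs, scores = pca(X)
print(eigvals / eigvals.sum())  # share of variance carried by each component
```

Because the columns are standardized first, the eigenvalues sum to the number of variables, and the variance of each column of `scores` equals the corresponding eigenvalue.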

Methods: Standardization

  • PCA creates linear combinations (t) of variables

\[t = c_{1}X_{1} + \dots + c_{n}X_{n}, \qquad t = Xc\]

  • X is the data set as a matrix, with one column per variable
  • c is a vector of coefficients (weights), one per variable
  • Maximum variance is the goal \[\underset{\|c\|=1}{\operatorname{argmax}}\ var(t)\]
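
A short derivation shows why this maximization leads to eigenvectors. It assumes the columns of X are centered, and it introduces two symbols not used elsewhere in these slides: S for the covariance matrix of X and λ for a Lagrange multiplier.

\[var(t) = var(Xc) = c^{T}Sc\]

\[L(c,\lambda) = c^{T}Sc - \lambda(c^{T}c - 1)\]

\[\frac{\partial L}{\partial c} = 2Sc - 2\lambda c = 0 \quad\Rightarrow\quad Sc = \lambda c\]

Any maximizer c is therefore an eigenvector of S, and for such a c the variance is \(c^{T}Sc = \lambda c^{T}c = \lambda\). The first principal component uses the eigenvector with the largest eigenvalue.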

Methods: Covariance Matrix and Eigenvectors

  • Covariance between two variables, X and Y, is calculated as follows: \[Cov(X,Y) = E[(X - E[X])(Y - E[Y])]\]
  • A covariance matrix can be constructed
\[\begin{bmatrix} Cov(x,x) & Cov(x,y) & Cov(x,z)\\ Cov(y,x) & Cov(y,y) & Cov(y,z)\\ Cov(z,x) & Cov(z,y) & Cov(z,z) \end{bmatrix}\]
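
As a rough check of the definition, the 3 × 3 matrix above can be built entry by entry and compared with NumPy's built-in routine. The variables `x`, `y`, `z` here are synthetic and the helper `cov` is hypothetical; the sample version with an n − 1 denominator is assumed in place of the expectation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.5, size=500)  # correlated with x
z = rng.normal(size=500)                       # independent of x and y

def cov(a, b):
    # Sample analogue of E[(a - E[a])(b - E[b])] with an n - 1 denominator.
    return np.sum((a - a.mean()) * (b - b.mean())) / (len(a) - 1)

variables = [x, y, z]
C = np.array([[cov(a, b) for b in variables] for a in variables])

# np.cov treats each row as a variable by default, so stack x, y, z as rows.
assert np.allclose(C, np.cov(np.vstack(variables)))

# Eigenvalues (ascending) and eigenvectors (columns) of the symmetric matrix C.
eigvals, eigvecs = np.linalg.eigh(C)
v, lam = eigvecs[:, -1], eigvals[-1]  # direction of largest variance
assert np.allclose(C @ v, lam * v)
```

The matrix is symmetric because Cov(x, y) = Cov(y, x), which is why `np.linalg.eigh` is a reasonable choice over the general `np.linalg.eig`.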

Data: Description

Variable                   Description
GP                         Games played
MPG                        Minutes per game
FTA, att2P, att3P          Number of free-throw, 2-point, and 3-point shot attempts
FTpct, pct2P, pct3p        Percentage of free-throw, 2-point, and 3-point shots made
eFGpct                     Effective field goal percentage
PPG, RPG, APG, SPG, BPG    Average points, rebounds, assists, steals, and blocks per game
ORTG, DRTG                 Offensive and defensive rating

Data: Histograms

Data: Boxplots

Data: Correlation Matrix

Data: Best Performers

Analysis: Eigenvalues

Analysis: Variables

Analysis: What Makes Each PC

Analysis: Which Players Contribute Most

Analysis: Biplot

Challenges

  • Standardization
  • Loss of information
  • Impact of outliers
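
The first two challenges can be illustrated with a small sketch. Everything in it is assumed for illustration: the helper `explained_variance_ratio` is hypothetical, and the three synthetic columns only mimic one large-scale variable (such as minutes) next to two small-scale ones (such as shooting percentages). The effect of outliers is not shown.

```python
import numpy as np

def explained_variance_ratio(X):
    """Hypothetical helper: share of total variance per principal component."""
    # eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them.
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    return eigvals / eigvals.sum()

rng = np.random.default_rng(2)
n = 300
big = rng.normal(loc=25.0, scale=8.0, size=n)       # large-scale column
small_a = rng.normal(loc=0.45, scale=0.05, size=n)  # small-scale column
small_b = rng.normal(loc=0.35, scale=0.05, size=n)  # small-scale column
X = np.column_stack([big, small_a, small_b])

# Without standardization the large-scale column dominates the first component.
raw = explained_variance_ratio(X)

# After standardization each column contributes on an equal footing.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
std = explained_variance_ratio(Z)

# Keeping only the first two components discards the remaining share of variance.
retained = std[:2].sum()
print(raw, std, retained)
```

On the raw data nearly all of the variance lands on the first component purely because of units, whereas on the standardized data the variance is spread across components, so dropping any of them loses a measurable share of information.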

Conclusion

  • Utilizing PCA for a holistic view of player-performance relationships

  • Harnessing diverse visualizations to illuminate player-metric connections

  • Identifying correlations and eliminating redundancies among variables

  • Understanding main axes (principal components) through eigenvectors

  • Showcasing PCA as a powerful unsupervised learning method